ABSTRACT

We propose an algorithm for separating the foreground
(mainly text and line graphics) from the smoothly varying
background in screen content images. The proposed method
is based on the assumption that the background part of the
image is smoothly varying and can be represented by a linear
combination of a few smoothly varying basis functions, while
the foreground text and graphics create sharp discontinuities
and cannot be modeled by this smooth representation. The
algorithm separates the background and foreground using a
least absolute deviation method to fit the smooth model to
the image pixels. This algorithm has been tested on several
images from HEVC standard test sequences for screen content
coding, and is shown to have superior performance over other
popular methods, such as k-means clustering based
segmentation in DjVu and the shape primitive extraction and
coding (SPEC) algorithm. Such background/foreground
segmentation is an important pre-processing step for text
extraction and for separate coding of background and
foreground in the compression of screen content images.

1. INTRODUCTION 

Screen content images refer to images appearing on the
display screens of electronic devices. These images share
similar characteristics with mixed content documents (such as
a magazine page). They often contain two layers: a pictorial
background and a foreground consisting of text and line
graphics. The usual image compression algorithms, such as
JPEG2000 and HEVC intra frame coding, may not achieve a good
compression rate for this kind of image. In these cases,
segmenting the image into two layers and coding them
separately may be more efficient. The idea of segmenting an
image for better compression was proposed for check image
compression, in the DjVu algorithm for scanned document
compression, and in the mixed raster content representation.

Screen content images are hard to segment, because the
foreground may be overlaid on a smoothly varying background
whose color range overlaps with the color of the foreground.
Also, because of the use of sub-pixel rendering, the same
text/line often has different colors. Even in the absence of
sub-pixel rendering, pixels belonging to the same text/line
often have somewhat different colors.

Most previous works on foreground segmentation are based on
color clustering or color counting and have difficulty in
cases where the background color has a large dynamic range or
is similar to the foreground color in some regions. The
hierarchical k-means clustering algorithm initially proposed
in DjVu is a representative algorithm based on color
clustering. This algorithm applies k-means clustering with
k=2 on blocks at multiple resolutions. It first applies
k-means clustering on a large block to obtain foreground and
background colors and then uses them as the initial
foreground and background colors for the smaller blocks in
the next stages. This algorithm has difficulty in regions
where background and foreground color intensities overlap.

In the shape primitive extraction and coding (SPEC) method,
which was developed for segmentation of screen content [6], a
two-step segmentation algorithm is proposed. In the first
step, each block of size 16 x 16 is classified as either a
pictorial block or a text/graphics block based on the number
of colors. If the number of colors is more than a threshold,
the block is classified as pictorial, otherwise as
text/graphics. In the second step, the segmentation result of
pictorial blocks is refined by extracting shape primitives (a
horizontal line, a vertical line or a rectangle with the same
color) and then comparing the size and color of the shape
primitives with some thresholds. Because blocks containing a
smoothly varying background over a narrow range can also have
a small number of colors, it is hard to find a fixed color
number threshold that can robustly separate pictorial blocks
from text/graphics blocks. Furthermore, text and graphics in
screen content images typically have some variation in their
colors, even in the absence of sub-pixel rendering. These
challenges limit the effectiveness of SPEC.

In this paper, we propose a foreground/background separation
algorithm which uses the smoothness property of the
background and the fact that the foreground pixels typically
deviate greatly from the smooth function fit to the
background. Using this intuition, we propose to use a least
absolute deviation approach [7] to fit each image block using
a smooth model. Those pixels which can be represented with
small distortion using this smooth model will be considered
as background and the rest as foreground. This technique can
be used for screen content video coding, text extraction [9],
medical image segmentation and classification, and principal
line extraction from palmprint images [12].

2. LEAST ABSOLUTE DEVIATION 

In this paper, we look at the foreground segmentation problem
from a signal decomposition point of view. We assume that the
background part of the image can be well represented with a
simple smooth model, whereas the foreground pixels cannot be
represented accurately with this smooth model. By well
represented we mean that the distortion between the
approximated smooth model and the actual pixel values is less
than a desired threshold. To be more specific, we divide each
image into non-overlapping blocks of size $N \times N$, and
then represent each image block, denoted by $F(x, y)$, with a
smooth model $S(x, y; \alpha_1, \ldots, \alpha_K)$, where $x$
and $y$ denote the horizontal and vertical axes and
$\alpha_1, \ldots, \alpha_K$ denote the parameters of this
smooth model. For color images, $F(x, y)$ represents the
luminance component. In order to find the optimal model
parameters, the $\alpha_k$'s, we define a cost function which
measures the goodness of fit between the intensity of
background pixels in the original image and the one predicted
by the smooth model, and then minimize this cost function:

$$\{\alpha_1^*, \ldots, \alpha_K^*\} = \arg\min_{\alpha_1, \ldots, \alpha_K} \|F(x, y) - S(x, y; \alpha_1, \ldots, \alpha_K)\|_p$$

Now two questions should be answered: 

1. What is an optimal smooth model for background layer
representation?

2. What error measure (i.e. value of $p$) should be used so
that the model parameters are mainly determined by the
background pixels?

For the first question, we propose to use a linear
combination of $K$ smooth basis functions,
$\sum_{k=1}^{K} \alpha_k P_k(x, y)$, where $P_k(x, y)$
denotes a 2D smooth basis function. Here we use a set of low
frequency two-dimensional DCT basis functions, since they
have been shown to be very efficient for image
representation. The 2-D DCT function is defined as:

$$P_{u,v}(x, y) = \beta_u \beta_v \cos\left(\frac{(2x+1)\pi u}{2N}\right) \cos\left(\frac{(2y+1)\pi v}{2N}\right)$$

where $u$ and $v$ denote the frequencies of the basis. We
order all possible basis functions in the conventional
zig-zag order in the $(u, v)$ plane, and choose the first $K$
basis functions. We have found that $K=10$ leads to very good
background representation for a variety of screen content
images (with PSNR over 45 dB), and additional bases do not
lead to a significant increase in the reconstruction quality.


Using this linear model, we need to solve the following
optimization problem to derive the model parameters:

$$\{\alpha_1^*, \ldots, \alpha_K^*\} = \arg\min_{\alpha_1, \ldots, \alpha_K} \left\|F(x, y) - \sum_{k=1}^{K} \alpha_k P_k(x, y)\right\|_p$$

We can also look at the 1D version of this problem by
converting each 2D block of size $N \times N$ into a vector
of length $N^2$, denoted by $f$, by concatenating the
columns, and writing $\sum_{k=1}^{K} \alpha_k P_k(x, y)$ as
$P\alpha$, where $P$ is a matrix of size $N^2 \times K$ in
which the $k$-th column corresponds to the vectorized version
of $P_k(x, y)$ and $\alpha = [\alpha_1, \ldots, \alpha_K]^T$.
Then the problem can be formulated as:

$$\alpha^* = \arg\min_{\alpha} \|f - P\alpha\|_p$$
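As a minimal sketch of how the matrix $P$ could be assembled, the Python snippet below generates the first $K$ DCT bases in zig-zag order and stacks their column-wise vectorized versions. The function names, the orthonormal choice of $\beta_u$ and $\beta_v$, and the direction in which the zig-zag traversal starts are assumptions made for illustration; the text does not fix them.

```python
import numpy as np

def zigzag_indices(n, count):
    """Return the first `count` (u, v) frequency pairs in a zig-zag order."""
    pairs = []
    for s in range(2 * n - 1):              # s = u + v indexes one anti-diagonal
        diag = [(u, s - u) for u in range(n) if 0 <= s - u < n]
        # Alternate the traversal direction on successive anti-diagonals.
        pairs.extend(diag if s % 2 else diag[::-1])
        if len(pairs) >= count:
            break
    return pairs[:count]

def dct_basis_matrix(n, k):
    """Build the (n*n) x k matrix P whose columns are vectorized 2-D DCT bases."""
    x = np.arange(n)

    def c(freq):
        # Orthonormal DCT-II scaling (assumed; beta is not specified in the text).
        beta = np.sqrt(1.0 / n) if freq == 0 else np.sqrt(2.0 / n)
        return beta * np.cos((2 * x + 1) * np.pi * freq / (2 * n))

    cols = []
    for u, v in zigzag_indices(n, k):
        basis = np.outer(c(v), c(u))            # rows index y, columns index x
        cols.append(basis.flatten(order="F"))   # concatenate the columns
    return np.stack(cols, axis=1)

P = dct_basis_matrix(64, 10)   # N = 64 and K = 10, as used in this paper
```

With this scaling the columns of `P` are orthonormal, so $P^T P$ is the identity; a different choice of $\beta$ would only rescale the coefficients $\alpha$.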

For the second question, we can use different distances
between the actual pixel values and the ones approximated by
the smooth model. As an example, minimizing the $\ell_2$ norm
gives the least squares fitting problem. Least squares
fitting suffers from the fact that the model parameters,
$\alpha$, can be adversely affected by foreground pixels:
because the residuals are squared, larger residuals get
larger weights in determining the model parameters. We
therefore propose to use least absolute deviation, which is
more robust to outliers than least squares fitting. This
leads to the following optimization problem:

$$\alpha^* = \arg\min_{\alpha} \|f - P\alpha\|_1 \qquad (1)$$
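The difference between the two norms is easiest to see for a constant model (a single DC basis), where the least squares fit reduces to the mean of the pixels and a least absolute deviation fit to their median. The toy pixel values below are made up for illustration and are not taken from the test images.

```python
import numpy as np

# Seven background pixels at intensity 100 and two dark text pixels at 0.
f = np.array([100.0] * 7 + [0.0] * 2)

ls_fit = f.mean()        # minimizes sum_i (f_i - a)^2
lad_fit = np.median(f)   # minimizes sum_i |f_i - a|

# Absolute fitting error on the seven background pixels under each fit.
ls_bg_error = np.abs(f[:7] - ls_fit)
lad_bg_error = np.abs(f[:7] - lad_fit)
```

Here the least squares fit is pulled toward the text pixels (about 77.8), so every background pixel is off by roughly 22, while the least absolute deviation fit stays at 100 and leaves the background pixels with zero error.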

The least absolute deviation problem does not have a closed
form solution, but it can be solved with iterative
algorithms. Different algorithms can be used to solve this
problem, such as the alternating direction method of
multipliers (ADMM), iteratively reweighted least squares
fitting, and linear programming. Here we use the ADMM
algorithm.

An alternative way to address the second question is to use a
robust regression approach to fit the smooth model to image
blocks such that the model parameters are determined using
only background pixels. One such work proposed to use the
RANSAC algorithm to fit the smooth model to the background
pixels.

2.1. ADMM formulation to solve least absolute deviation 

To solve (1) with ADMM, we introduce the auxiliary variable
$z = P\alpha - f$ and convert the original problem into the
following form:

$$\begin{aligned}
\underset{z, \alpha}{\text{minimize}} \quad & \|z\|_1 \\
\text{subject to} \quad & P\alpha - z = f.
\end{aligned}$$

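Then we can use the following updates for each iteration of
ADMM:

$$\begin{aligned}
\alpha^{k+1} &= (P^T P)^{-1} P^T (f + z^k - u^k) \\
z^{k+1} &= S_{1/\rho}(P\alpha^{k+1} - f + u^k) \\
u^{k+1} &= u^k + P\alpha^{k+1} - z^{k+1} - f
\end{aligned}$$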


where $u$ denotes the dual variable, $\rho$ is the augmented
Lagrangian parameter, and $S_{1/\rho}$ denotes the
soft-thresholding operator, applied elementwise and defined
as:

$$S_{1/\rho}(x) = \operatorname{sign}(x) \max(|x| - 1/\rho, 0)$$
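The three updates translate almost line by line into NumPy. The sketch below is an illustrative re-implementation rather than the code used for the experiments; the function names, the zero initialization of $z$ and $u$, and the fixed iteration count in place of a stopping criterion are assumptions.

```python
import numpy as np

def soft_threshold(x, kappa):
    """Elementwise soft-thresholding: sign(x) * max(|x| - kappa, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - kappa, 0.0)

def lad_admm(P, f, rho=1.0, num_iters=200):
    """Approximately solve min_alpha ||f - P alpha||_1 with ADMM."""
    z = np.zeros(P.shape[0])
    u = np.zeros(P.shape[0])
    # (P^T P)^{-1} P^T does not change across iterations, so compute it once.
    pinv = np.linalg.solve(P.T @ P, P.T)
    alpha = np.zeros(P.shape[1])
    for _ in range(num_iters):
        alpha = pinv @ (f + z - u)              # alpha-update
        r = P @ alpha - f
        z = soft_threshold(r + u, 1.0 / rho)    # z-update
        u = u + r - z                           # dual update
    return alpha

# Toy usage: constant model, seven pixels at 100 and two outliers at 0.
P = np.ones((9, 1))
f = np.array([100.0] * 7 + [0.0] * 2)
alpha = lad_admm(P, f)   # approaches the median of f, i.e. 100
```

Because $P$ is fixed for a given block size, the matrix `pinv` could also be cached and shared across all blocks of that size.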

2.2. Segmentation Algorithm 

We propose a segmentation algorithm which first checks
whether a block can be segmented using simpler methods. These
simple cases cover two groups of blocks: completely flat
blocks and smoothly varying background without foreground.
Completely flat blocks are those in which all pixels have the
same value; they are common in screen content images. They
are declared as background or foreground based on their
neighboring blocks' background color: if we can find at least
one neighboring block with a background color close enough to
the current block's color (difference less than
$\epsilon_2$), the block is segmented as background. A
smoothly varying background without foreground is a block in
which the intensity of all pixels can be modeled well by the
smooth function. Therefore we fit $K$ DCT bases to all pixels
using least squares fitting, and if the intensity of all
pixels can be predicted with distortion less than
$\epsilon_3$, the block is segmented entirely as background.
We apply least absolute deviation fitting only if a block
does not satisfy either of these two conditions. Furthermore,
at the end of the least absolute deviation fitting, we check
the percentage of identified background pixels. If the
percentage is less than a threshold, we divide the block into
4 smaller blocks and repeat the process. The overall
segmentation algorithm is summarized below:

1. If all pixels in the block have the same color intensity
(i.e. it is a completely flat block), declare the entire
block as background or foreground as explained above. If not,
go to the next step;

2. Perform least squares fitting using the luminance values
of all pixels. If all pixels can be predicted with an
absolute error less than $\epsilon_3$, declare the entire
block as background. If not, go to the next step;

3. Use least absolute deviation to fit a model to the
luminance values of the image block and find the absolute
fitting error of all pixels using that model. Each pixel with
a distortion less than a threshold $\epsilon_1$ will be
considered as background, otherwise as foreground. If the
percentage of background pixels is more than $\epsilon_4$
then stop, otherwise go to the next step;

4. Decompose the current block of size $N \times N$ into 4
smaller blocks of size $N/2 \times N/2$ and run the
segmentation algorithm on all of them. Repeat until $N = 8$.
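The four steps can be sketched as a short recursive Python function. This is a simplified illustration under stated assumptions, not the implementation evaluated in this paper: the helper names are hypothetical, the DCT scaling is taken to be orthonormal, bases are ordered by $u + v$ (which selects the same ten bases as the zig-zag order for $K = 10$), and a flat block is always labeled background because the neighbor-based test with $\epsilon_2$ needs context from adjacent blocks.

```python
import numpy as np

def dct_basis(n, k):
    """(n*n) x k matrix of low-frequency 2-D DCT bases, vectorized column-wise."""
    x = np.arange(n)
    def c(q):  # orthonormal 1-D DCT-II vector of frequency q (assumed scaling)
        return np.sqrt((1.0 if q == 0 else 2.0) / n) * np.cos((2 * x + 1) * np.pi * q / (2 * n))
    freqs = sorted(((u, v) for u in range(n) for v in range(n)),
                   key=lambda t: (t[0] + t[1], t[0]))[:k]
    return np.stack([np.outer(c(v), c(u)).flatten(order="F") for u, v in freqs], axis=1)

def lad_fit(P, f, rho=1.0, iters=200):
    """Least absolute deviation fit via the ADMM updates of Section 2.1."""
    pinv = np.linalg.solve(P.T @ P, P.T)
    z, u = np.zeros_like(f), np.zeros_like(f)
    for _ in range(iters):
        alpha = pinv @ (f + z - u)
        r = P @ alpha - f
        z = np.sign(r + u) * np.maximum(np.abs(r + u) - 1.0 / rho, 0.0)
        u = u + r - z
    return alpha

def segment_block(block, eps1=10.0, eps3=3.0, eps4=0.5, k=10, min_size=8):
    """Return a boolean foreground mask for a square luminance block."""
    n = block.shape[0]
    f = block.astype(float).flatten(order="F")
    # Step 1: flat block (simplification: always treated as background here).
    if np.all(f == f[0]):
        return np.zeros((n, n), dtype=bool)
    P = dct_basis(n, k)
    # Step 2: least squares fit; accept if every pixel is predicted within eps3.
    alpha_ls = np.linalg.lstsq(P, f, rcond=None)[0]
    if np.all(np.abs(f - P @ alpha_ls) < eps3):
        return np.zeros((n, n), dtype=bool)
    # Step 3: least absolute deviation fit and per-pixel thresholding with eps1.
    alpha = lad_fit(P, f)
    fg = (np.abs(f - P @ alpha) >= eps1).reshape((n, n), order="F")
    if 1.0 - fg.mean() > eps4 or n <= min_size:
        return fg
    # Step 4: split into four sub-blocks and recurse.
    h = n // 2
    out = np.zeros((n, n), dtype=bool)
    for r0 in (0, h):
        for c0 in (0, h):
            out[r0:r0 + h, c0:c0 + h] = segment_block(
                block[r0:r0 + h, c0:c0 + h], eps1, eps3, eps4, k, min_size)
    return out
```

Calling `segment_block` on a 64 x 64 luminance array with the default arguments mirrors the parameter values reported in Section 3; the recursion stops splitting once the block size reaches 8.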

The above algorithm makes an initial decision based on the
luminance component of a block only. At the end of this
process, we further use least squares fitting to find a
smooth model for each of the two chrominance components (Cb
and Cr) using the chrominance values at the identified
background pixels. If the fitting error for any color
component is larger than $\epsilon_1$ for any pixel, that
pixel is reclassified as foreground.
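A possible form of this chrominance check for one component is sketched below. The function name and signature are hypothetical, and the basis matrix `P` is assumed to be the same $N^2 \times K$ matrix used for the luminance fit, with pixels vectorized column by column.

```python
import numpy as np

def refine_with_chroma(fg_mask, chroma, P, eps1=10.0):
    """Reclassify background pixels whose chroma is poorly fit by the smooth model."""
    n = fg_mask.shape[0]
    fg = fg_mask.flatten(order="F").copy()
    c = chroma.astype(float).flatten(order="F")
    bg = ~fg
    if not bg.any():
        return fg_mask.copy()
    # Least squares fit using only the pixels currently labeled background.
    alpha = np.linalg.lstsq(P[bg], c[bg], rcond=None)[0]
    err = np.abs(c - P @ alpha)
    # Background pixels with a large chroma error become foreground.
    fg |= bg & (err > eps1)
    return fg.reshape((n, n), order="F")
```

The function would be applied once for Cb and once for Cr, feeding the mask returned by the first call into the second.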

3. RESULTS 

To enable rigorous evaluation of different algorithms, we have 
generated an annotated dataset consisting of 332 image blocks 
of size 64 x 64, extracted from sample frames from 5 HEVC 
test sequences for screen content coding. The ground truth 
foregrounds for these images are extracted manually. 

Before showing the results, let us discuss the parameters of
our algorithm. In our implementation, the block size is
chosen to be $N=64$, which is the same as the largest CU size
in the HEVC standard. The number of DCT basis functions, $K$,
is chosen to be 10 based on the training images. The other
parameters are chosen as $\epsilon_1 = 10$, $\epsilon_2 = 10$,
$\epsilon_3 = 3$ and $\epsilon_4 = 0.5$, which have been found
to perform well on a training set. For the ADMM algorithm, we
have used the implementation by Stephen Boyd. The number of
iterations is chosen to be 200 and the parameter $\rho$ is
chosen as the default value, 1.

We compare the proposed algorithm with two algorithms:
hierarchical k-means clustering in DjVu and the shape
primitive extraction and coding (SPEC) method. For SPEC, we
have adapted the color number threshold and the shape
primitive size threshold from the default values given in [6]
when necessary to give more satisfactory results.
Furthermore, for blocks classified as text/graphics based on
the color number, we segment the most frequent color and any
similar color (i.e. colors whose distance from the most
frequent color is less than 10) in the current block as
background and the rest as foreground.

To provide a numerical comparison between the proposed scheme
and previous approaches, we have calculated the average
precision and recall achieved by different segmentation
algorithms over this dataset. The average precision and
recall of the different algorithms are given in Table 1. As
can be seen, the proposed scheme achieves a much higher
precision and recall (0.9147 and 0.8773) than the other
algorithms (at most 0.6491 and 0.6909).
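For reference, precision and recall for a single block can be computed from a predicted and a ground truth foreground mask as sketched below. Treating the foreground as the positive class, and returning 1.0 when a denominator is zero, are assumptions; the text does not spell out these conventions.

```python
import numpy as np

def precision_recall(pred_fg, true_fg):
    """Precision and recall of a predicted foreground mask against ground truth."""
    pred_fg = np.asarray(pred_fg, dtype=bool)
    true_fg = np.asarray(true_fg, dtype=bool)
    tp = np.sum(pred_fg & true_fg)     # foreground pixels found correctly
    fp = np.sum(pred_fg & ~true_fg)    # background pixels marked as foreground
    fn = np.sum(~pred_fg & true_fg)    # foreground pixels that were missed
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return float(precision), float(recall)

pred = np.array([[1, 1], [0, 0]])
true = np.array([[1, 0], [1, 0]])
p, r = precision_recall(pred, true)   # one hit, one false alarm, one miss
```

Averaging these per-block values over all blocks of the dataset gives figures of the kind reported in Table 1.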


Table 1: Accuracy comparison of different algorithms 


Performance Criteria   SPEC     Clustering in DjVu   The proposed algorithm
Precision              0.5038   0.6491               0.9147
Recall                 0.6458   0.6909               0.8773


The results for 5 test images (each consisting of multiple
64x64 blocks) are shown in Fig. 1. It can be seen that in
all cases the proposed algorithm gives superior performance
over DjVu and SPEC. Note that our dataset mainly consists
of challenging images where the background and foreground
have overlapping color ranges. For simpler cases where the
background has a narrow color range that is quite different
from the foreground, both DjVu and the proposed method will
work well. On the other hand, SPEC does not work well when
the background is fairly homogeneous within a block and the
foreground text/lines have varying colors.

Fig. 1: Segmentation result for test images. The images in the first and second rows denote the original images and ground
truth foregrounds. The images in the third, fourth and fifth rows denote the foreground maps produced by shape primitive extraction and
coding, hierarchical clustering in DjVu and the proposed algorithm, respectively.

4. CONCLUSION 

This paper proposed an algorithm for segmentation of screen
content images into a foreground layer consisting mainly of
text and lines and a background layer consisting of smoothly
varying regions. We developed a least absolute deviation
approach to fit a smooth model to image blocks. A pixel is
considered background if it can be represented accurately by
the smooth model; otherwise it is considered foreground.
Instead of applying this algorithm to every block, which is
computationally demanding, we first check whether a block can
be segmented using simpler methods. This helps to reduce the
computational complexity. The algorithm has been tested on
several test images and compared with two well-known
algorithms for foreground segmentation, SPEC and hierarchical
clustering in DjVu, and it shows significantly better
performance for blocks where the background and foreground
pixels have overlapping intensities. Note that the proposed
algorithm is not limited to screen content image
segmentation. It has other applications such as text
extraction in images and principal line extraction in
palmprint images.